Wavelet scattering transform application in classification of retinal abnormalities using OCT images

To assist ophthalmologists in diagnosing retinal abnormalities, Computer Aided Diagnosis has played a significant role. In this paper, a particular Convolutional Neural Network based on Wavelet Scattering Transform (WST) is used to detect one to four retinal abnormalities from Optical Coherence Tomography (OCT) images. Predefined wavelet filters in this network decrease the computational complexity and processing time compared to deep learning methods. We use two layers of the WST network to obtain a direct and efficient model. WST generates a sparse representation of the images which is translation-invariant and stable with respect to local deformations. Next, a classifier based on Principal Component Analysis categorizes the extracted features. We evaluate the model using four publicly available datasets to have a comprehensive comparison with the literature. The accuracies of classifying the OCT images of the OCTID dataset into two and five classes were 100% and 82.5%, respectively. We achieved an accuracy of 96.6% in distinguishing Diabetic Macular Edema (DME) from Normal cases using the TOPCON device-based dataset. The Heidelberg and Duke datasets contain DME, Age-related Macular Degeneration (AMD), and Normal classes, on which we achieved accuracies of 97.1% and 94.4%, respectively. A comparison of our results with the state-of-the-art models shows that our model outperforms these models for some assessments or achieves nearly the best results reported so far while having a much smaller computational complexity.


Zahra Baharlouei 1 , Hossein Rabbani 1* & Gerlind Plonka 2
The retina is the innermost layer of the eye and is responsible for vision. Various diseases have been diagnosed in this sensitive part of the eye, which affect different layers of the retina. In Diabetic Retinopathy (DR), retinal blood vessels can leak or become blocked. Several changes, such as an increase in the thickness of retinal layers, are seen in this abnormality. It is a serious cumulative vascular condition that damages retinal cells with no obvious visual symptoms at first, but it can progress to a widespread and severe state, and the disease's progression can result in blindness 1. The changes in DR involve the retinal microvasculature, specifically the tight junctions of the endothelial cell wall 2. Age-related Macular Degeneration (AMD) usually appears with thickening of the Retinal Pigment Epithelium (RPE) layer. AMD originates either from the choroid or, less frequently, from the retinal circulation. The leakage in the aberrant vessels results in fluid accumulation underneath the retina and leads to rapid visual loss 3. AMD is categorized into three stages: early, intermediate, and late. The first two are non-advanced stages with no fluid or atrophy. Advanced AMD is characterized by the advanced dry stage and the advanced exudative stage 4. Macular Hole (MH) leads to distorted or blurred vision, as well as a decrease in visual acuity. Thickened edges, fluid accumulation, and macular edema are signs of MH. An important factor in the development of MH is parafoveal vitreous detachment. Anteroposterior traction with parafoveal vitreous detachment may be involved in the onset and development of MH 5. Central Serous Retinopathy (CSR) is an eye condition characterized by the accumulation of fluid under the retina in the central macular area. Leakage of fluid into the retina through an RPE defect is seen in CSR 6. In this disease, dysfunctional retinal pigment epithelial cells and/or choroid lining the retina lead to the development of sub-retinal fluid 7.
Retinal abnormalities are diagnosed through observation of retinal images. Optical Coherence Tomography (OCT) is a widely accessible, non-invasive medical imaging technique that uses light to capture pictures of the retina at microscopic resolution 8. Manual diagnosis of retinal abnormalities is costly and time-consuming, and it requires highly trained clinicians to be precise. Early diagnosis of such pathologies can decrease

The rest of the paper is organized as follows: First, we review the literature in section "Related works". The section "Materials and Methods" introduces the datasets and describes the method. In section "Results", the experimental results are presented. In the section "Discussion", we summarize the results and analyze them. Section "Conclusion" summarizes the article.

Related works
The results of previous classification methods in the literature differ with respect to dataset properties (such as the contrast of images, imaging system, noise level, and size of dataset), the network depth, the generality of the algorithm, computational complexity, and processing time. Therefore, the methods cannot be easily compared 19. For example, 28 achieved an accuracy of 88.4% using 2000 images from the EyePACS dataset, while 18 reported an accuracy of 97.93% using a more complex network and 35,126 images from the same dataset. The authors in 29 used a four-layer Convolutional Neural Network (CNN) and reported accuracies of 87.83% with pre-processing and 81.8% without it. Some papers focus on diagnosing only one particular disease. In He et al. 30, AMD was distinguished from Normal cases using ResNet-50. An AUC of 0.99, a sensitivity of 95.02%, and a specificity of 95.02 were reported. Dry AMD (drusen) versus wet AMD was diagnosed from OCT images using FPN-VGG-16, which led to 93.4% accuracy 31. In An et al. 32, AMD with fluid was distinguished from AMD without fluid using VGG-16, with an accuracy of 95.1%.
Some other papers tried to diagnose two or more diseases using different methods and datasets. Rasti et al. 43 recognized AMD, DME, and Normal cases with an accuracy of 98.14% using a multi-scale convolutional mixture of experts, while 44 diagnosed the same classes with an accuracy of 92.06% using a surrogate CNN. Using a wavelet-based CNN model, an accuracy of 98.67% was achieved for the three-class classification task in Kafieh et al. 25. In Elmoufidi et al. 45, different stages of DR were detected using a CNN.
In addition to OCT images, some datasets acquired by other imaging technologies such as Fundus and OCT Angiography (OCTA) are used in the papers. Fundus imaging is preferred for vascular diseases [46][47][48][49]. Hacisoftaoglu 47, using smartphone-based methods on some datasets with Fundus images, achieved an accuracy of 98.6%. Using a DCNN with 10-fold cross-validation, an accuracy of 99.28% was achieved in Shankar et al. 48. Some researchers evaluated their works using both OCT and Fundus images, e.g. 2,49. OCTA has recently attracted the attention of researchers. It is a non-invasive imaging technique used in ophthalmology to visualize the blood vessels in the retina and choroid (the vascular layer behind the retina). 1][52][53].
A review of the classification results for retinal diseases shows that deep learning based methods mostly have higher performance than basic machine learning ones. Basic machine learning methods, however, usually have higher processing speed. In Sandhu et al. 54, the authors tried to reduce the image dimensions and improve the classification performance using the feature bagging technique. They achieved an accuracy of 80% with low computational time. In Somasundaram and Ali 55, by extracting wavelet features and using four classification methods, 82% accuracy was obtained. In some basic machine learning models, high accuracy was achieved using special pre-processing techniques. For example, in Ali 56, a novel pre-processing method was proposed, different features were extracted, and five classification methods were implemented to achieve an average accuracy of 98.83%. Compared with 54, the improved accuracy in Ali 56 came at the cost of increased processing time. Most CNN-based methods, and specifically DCNN models, achieved higher accuracy than others. For example, 38,48,57 achieved best accuracies of 99.1%, 99.28%, and 99.73%, respectively, in detecting DR grades using DCNN models.

Materials and method
In this work, we aim to diagnose retinal diseases from OCT images. We use the Wavelet Scattering Transform (WST) to obtain a sparse representation of images. Next, we employ a PCA-based classifier to categorize the retinal diseases into different classes. We test our model on different OCT datasets to verify the accuracy of the model. We use the OCTID dataset to show the relatively good accuracy of the model in detecting diseases from a large number of classes and a small amount of training data. Finally, we also use some well-known datasets involving a different number of images in 2 or 3 classes to compare the accuracy with state-of-the-art models in the literature. The block diagram of the architecture is shown in Fig. 1.
In the rest of this section, we explain the used datasets, the method, and the classification in more detail.

OCT datasets
In this work, four open-access datasets of OCT images are used. In the following, we describe the details of the OCTID 23, TOPCON 25, Duke 27, and Heidelberg 26 datasets. The OCTID dataset includes 572 OCT images that are categorized into five classes: Normal, CSR, MH, AMD, and DR. Images have 586 × 879 pixel resolution and 2 mm scan length, and are obtained from a raster scan protocol using a Cirrus HD-OCT machine 23.
The TOPCON dataset includes 57171 B-scans of DME and Normal images with 650 × 512 resolution obtained from the Topcon 1000 device in the Ophthalmology Dept., Feiz Hospital, Isfahan, Iran.
The Duke-Harvard-Michigan Heidelberg dataset contains 45 cases of AMD, DME, and Normal with a total of 3231 OCT images, which have 496 × 1024 resolution.
The dataset from the Heidelberg device was acquired at Noor Eye hospital in Tehran and contains 50 Normal and DME, and 48 AMD cases with a total of 4254 OCT images. The resolution of the images is 512 × 496.
A sample of the images in each class of these datasets is presented in Fig. 2, and the properties of the datasets used in this work are listed in Table 1.

The WST-based method is computationally efficient and can handle large volumes of data. This makes it a good choice for applications where real-time processing is required. Our results show that only two layers in this network are sufficient to achieve very good classification results.
We feed 2-D OCT images, without any pre-processing, to a WST architecture. After transferring the OCT images to the sparse representation, a PCA-based classifier categorizes the retinal diseases into different classes. In the following, we briefly summarize the WST approach in the continuous setting.
Let $f(x)$ with $x = (x_1, x_2)^T$ be the two-dimensional signal on a rectangular (image) domain $\Omega \subset \mathbb{R}^2$. In the first step, the image $f$ is filtered by applying convolutions with the scaled Gaussian (low-pass) function $\phi$ and a scaled and rotated (band-pass) wavelet function $\psi$. Then we take the modulus of these convolutions and apply a localized averaging by convolution with the scaled Gaussian $\phi$. As in Bruna and Mallat 22, let

$$\phi_J(x) := 2^{-2J}\, \phi(2^{-J} x),$$

where $\phi(x) := \frac{1}{2\pi\sigma^2} \exp(-|x|^2/2\sigma^2)$ is the two-dimensional Gaussian window function with $\sigma = 0.85$. Then

$$S_{0,J} f(x) := f * \phi_J(x)$$

is the zeroth order scattering coefficient representing the low-pass part of $f$. Next, we consider the two-dimensional Morlet wavelet

$$\psi(x) := c_1 \left( e^{i\, \xi \cdot x} - c_2 \right) \phi(x),$$

where $\xi \in \mathbb{R}^2$ is the frequency of the plane wave, $c_1$ is a normalization factor, and $c_2$ is chosen such that $\int_{\mathbb{R}^2} \psi(x)\, dx = 0$. In other words, $\psi(x)$ is the difference between a plane wave and a constant, localized by the Gaussian window $\phi(x)$, and can be interpreted as a band-pass filter. Further, let $\Gamma := \{0, \frac{\pi}{r}, \frac{2\pi}{r}, \ldots, \frac{(r-1)\pi}{r}\}$ be a fixed set of $r$ equidistant rotation angles in $[0, \pi)$, where we usually set $r = 12$ in our experiments. Then the scaled and rotated wavelet functions are determined by

$$\psi_{j,\gamma}(x) := 2^{-2j}\, \psi(2^{-j} R_\gamma x),$$

where

$$R_\gamma = \begin{pmatrix} \cos\gamma & \sin\gamma \\ -\sin\gamma & \cos\gamma \end{pmatrix}$$

denotes the rotation matrix corresponding to $\gamma \in \Gamma$. The vector of scattering coefficients of the first order is now given by

$$S_{1,J} f(x) := \left( |f * \psi_{j_1,\gamma_1}| * \phi_J(x) \right)_{j_1,\; \gamma_1 \in \Gamma},$$

where $j_1$ runs over the $J$ considered scales. Indeed, the ... Employing the convolution with a wide Gaussian window $\phi_J$ gives a similar result, i.e., we have almost translation invariance: $S_{1,J} f(x + \tau) \approx S_{1,J} f(x)$ if the components of $\tau$ are small enough. The scattering coefficients of the first order are equivalent to the feature vector obtained in the Scale-Invariant Feature Transform (SIFT), a locally invariant image descriptor proposed in Lowe 42. The convolution of $|f * \psi_{j_1,\gamma_1}(x)|$ with the Gaussian window $\phi_J$ is a low-pass filtering procedure that causes an information loss. To recover high-frequency information, the vector of scattering coefficients of the second order is computed as

$$S_{2,J} f(x) := \left( \big|\, |f * \psi_{j_1,\gamma_1}| * \psi_{j_2,\gamma_2} \big| * \phi_J(x) \right)_{j_1 < j_2,\; \gamma_1, \gamma_2 \in \Gamma}.$$

More translation-invariant scattering coefficients can be computed by iterating this procedure, and the energy of the image signal $f$ is propagated across the scattering coefficients. As has been shown in Bruna and Mallat 60, the scattering coefficients of order 0 to 2 usually already contain more than 98% of the energy of $f$. Thus we use only the coefficients in layers 0, 1, 2, which reduces the computational complexity significantly. Figure 3 shows the WST with $m = 2$ used in this work. Observe that in the considered continuous setting the image $f$ as well as all scattering coefficients are still functions on $\Omega$. We set the dimension of the scaling filter, called invariant scale, equal to the minimum dimension of the images for each dataset used in this paper. In practice, we have a given discrete image $f$ with $N$ pixels and the convolutions have to be discretized. The total number of scattering coefficients in $S_{1,J}$ is $Jr$ and the number of scattering coefficients in $S_{2,J}$ is $\frac{r^2 J(J-1)}{2}$, where $r$ is the number of considered angles. These functions are uniformly sampled with grid size $2^J$ such that each discretized scattering coefficient has $2^{-2J} N$ coefficients, where $N$ is the number of pixels of the image $f$. Together, the total length of the discrete feature vector $S_J f$ of $f$ (consisting of components of all feature coefficients of order 0, 1, and 2) is then

$$N_J = 2^{-2J} N \left( 1 + Jr + \frac{r^2 J(J-1)}{2} \right).$$
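The filter construction and the coefficient counts described above can be illustrated with a minimal pure-Python sketch. It is not the Matlab implementation used in this work. The function names, the plane-wave frequency `xi = (3π/4, 0)`, the grid half-width, and the choice `c_1 = 1` are assumptions made only for illustration; the text fixes only σ = 0.85, the rotation set, and the counts `Jr` and `r²J(J−1)/2`.

```python
import cmath
import math

SIGMA = 0.85  # Gaussian width stated in the text


def gaussian(x1, x2, sigma=SIGMA):
    """Two-dimensional Gaussian window phi(x)."""
    return math.exp(-(x1 * x1 + x2 * x2) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def morlet_samples(gamma, xi=(3 * math.pi / 4, 0.0), half_width=8):
    """Sample a rotated Morlet wavelet on a square integer grid.

    xi is an ASSUMED plane-wave frequency and c1 is left at 1 (no normalization).
    c2 is chosen so that the samples sum to zero (discrete zero-mean condition).
    """
    c, s = math.cos(gamma), math.sin(gamma)
    pairs = []
    for n1 in range(-half_width, half_width + 1):
        for n2 in range(-half_width, half_width + 1):
            # Apply R_gamma = [[cos, sin], [-sin, cos]] to the grid point.
            y1, y2 = c * n1 + s * n2, -s * n1 + c * n2
            wave = cmath.exp(1j * (xi[0] * y1 + xi[1] * y2))
            pairs.append((wave, gaussian(y1, y2)))
    c2 = sum(w * g for w, g in pairs) / sum(g for _, g in pairs)
    return [(w - c2) * g for w, g in pairs]


def scattering_counts(num_pixels, J, r):
    """Return (first-order count, second-order count, feature length N_J).

    Assumes num_pixels is divisible by 4**J, i.e. exact subsampling by 2**J
    in each direction.
    """
    first = J * r
    second = r * r * J * (J - 1) // 2
    samples_per_coefficient = num_pixels // 4 ** J
    return first, second, samples_per_coefficient * (1 + first + second)


if __name__ == "__main__":
    rotations = [k * math.pi / 12 for k in range(12)]  # r = 12 angles in [0, pi)
    print(max(abs(sum(morlet_samples(g))) for g in rotations))
    print(scattering_counts(64 * 64, J=2, r=12))
```

For instance, with r = 12 and a hypothetical J = 2, the feature vector stacks 1 + 24 + 144 = 169 subsampled coefficient maps.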

Classifier
We employ a classifier based on PCA applied to a suitable affine space, as proposed in Bruna and Mallat 22. The classification algorithm directly employs the scattering coefficient vectors $S_J f$. Assume that we have computed a complete vector of scattering coefficients of length $N_J$ that contains the scattering coefficients of $f$ of order 0, 1, and 2 at subsampled positions. Each signal class is represented by a random vector $f_k$, and the realizations of this random vector are the images $f$ in this class. Let $E(S_J f_k)$ denote the expected scattering coefficient vector of length $N_J$ of images $f$ in class $k$. Further, let $V_{d,k}$ be the rank-$d$ approximation of the covariance matrix of $S_J f_k$ of size $N_J \times N_J$ built by the eigenvectors of the covariance matrix corresponding to the largest $d$ eigenvalues. In our experiments, we have used $d = 30$. We obtain the affine approximation space

$$A_{d,k} := E(S_J f_k) + V_{d,k},$$

where $V_{d,k}$ is identified with the space spanned by these $d$ eigenvectors; see also 22. Having found this affine space, the classifier associates an image $f$ to the class $k$ (among $K$ classes) if

$$k = \underset{1 \le k' \le K}{\arg\min}\; \left\| S_J f - P_{A_{d,k'}}(S_J f) \right\|_2,$$

where $P_{A_{d,k}}$ denotes the projection onto the affine space $A_{d,k}$. The computational effort for the classification is governed by the required singular value decomposition of the covariance matrix of $S_J f_k$ with $O(N_J^3)$ floating point operations.
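The affine-space classification rule can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' Matlab code. The function names `fit_affine_spaces` and `classify` are hypothetical, the expectation is replaced by the sample mean, and the principal directions are taken from an SVD of the centered feature matrix rather than from an explicit eigen-decomposition of the covariance matrix (both give the same leading directions).

```python
import numpy as np


def fit_affine_spaces(features, labels, d):
    """Estimate one affine space (mean, d orthonormal directions) per class.

    features: array of shape (num_images, N_J), one scattering vector per row.
    labels:   integer class label per row.
    """
    spaces = {}
    for k in np.unique(labels):
        class_features = features[labels == k]
        mean = class_features.mean(axis=0)
        # Rows of vt are the principal directions of the centered class data,
        # ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(class_features - mean, full_matrices=False)
        spaces[int(k)] = (mean, vt[:d])
    return spaces


def classify(spaces, s):
    """Return the class whose affine space is closest to the vector s."""
    best_class, best_error = None, np.inf
    for k, (mean, basis) in spaces.items():
        centered = s - mean
        # Residual after orthogonal projection onto the span of the basis rows.
        residual = centered - basis.T @ (basis @ centered)
        error = np.linalg.norm(residual)
        if error < best_error:
            best_class, best_error = k, error
    return best_class
```

Working with the centered data matrix avoids forming the full N_J × N_J covariance matrix when the number of training images per class is much smaller than N_J.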

Results
To assess the model, we classified the OCT images of the OCTID, TOPCON, Duke, and Heidelberg datasets. These datasets differ in technologies, the number of images and their dimensions, and also the number of classes. The wavelet scattering features are extracted, and a PCA-based classifier is used to diagnose the retinal abnormalities. In this work, the wavelet scattering transform was implemented in Matlab. As mentioned in the Method Section, the energy of the signals decreases significantly as the number of layers increases. Using two layers of wavelet filter banks is sufficient for classifying OCT images. For each wavelet filter, different numbers of rotations, from 6 to 12, in [0, π] were considered. The best results were obtained with 12 rotations for all datasets except for OCTID, for which increasing the number of rotations did not have any effect on the results. The spatial support in the row and column dimensions of the scaling filter was set to half of the minimum dimension of the images for each dataset. To train the network, we used 80% of the data, and the remaining 20% was used for testing. We tested our model to investigate the accuracy of diagnosing five categories in the OCTID dataset. The result is shown in Fig. 4. The accuracy of this classification is 82.5%. Only one work in the literature reported classification results for five classes in OCTID 61. In Mishra et al. 61, an accuracy of 93.12% (± 8.59) was reported using a CNN model. The model includes 13 convolution layers, 4 Maxpool layers, three fully connected layers, an attention module, and reshape, normalization, flatten, softmax, and loss steps. Comparing the process steps and network layers in Mishra et al. 61 with our model shows the trade-off between accuracy on the one hand and computational complexity and processing time on the other. This is in addition to the shortcomings of black-box CNN models. Most of the classification works on the OCTID dataset investigated the detection accuracy for two classes, i.e., one abnormality versus Normal cases. We examined our model for detecting DR pathology, which is one of the most common diseases in diabetic patients. Figure 5 shows that our method achieved an accuracy of 100% for DR detection. Table 2 compares our result with other works in detecting DR. As seen in the table, this model outperforms other state-of-the-art models.
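The evaluation protocol described above (an 80%/20% split followed by accuracy read off a confusion matrix) can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say how the split was drawn, so the seeded random shuffle, the function names, and the toy labels are hypothetical.

```python
import random


def holdout_split(num_items, train_fraction=0.8, seed=0):
    """Shuffle item indices and split them into train and test index lists.

    The seeded shuffle is an assumption; the split procedure is not specified
    in the text beyond the 80%/20% proportions.
    """
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    cut = int(train_fraction * num_items)
    return indices[:cut], indices[cut:]


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    train_idx, test_idx = holdout_split(572)  # 572 images, as in OCTID
    print(len(train_idx), len(test_idx))
    toy = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
    print(toy, accuracy(toy))
```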
Next, we tested our model on the TOPCON dataset. An accuracy of 96.6% was achieved in distinguishing DME from Normal cases, as seen in Fig. 6. To compare our results with other works, we list the best results reported in the literature in Table 3. As seen in the table, most CNN-based works achieved a higher accuracy. The WST-based model in this paper achieves an accuracy close to that of the complicated CNN-based architectures while using a simple architecture. To compare the performance of this work with research on other well-known datasets, we tested our model on the Duke and Heidelberg datasets to distinguish DME and AMD from Normal cases. We achieved accuracies of 97.1% and 94.4%, respectively. The results are shown in Figs. 7 and 8.
The best results reported in the literature on the Duke and Heidelberg datasets are compared in Tables 4 and 5. The results show that we achieved the best accuracy in classification on the Duke dataset. Since most of the works on the Duke dataset used k-fold cross-validation, we also implemented 10-fold validation to have a fair comparison. We achieved an accuracy of 96.7%, which is the best result reported in the literature and equal to the one in

Discussion
In this article, we used the WST-based method to diagnose retinal diseases from OCT images. We achieved different accuracies for the four datasets used. Comparing the accuracy obtained with this method with other methods in Tables 2, 3, 4 and 5 shows that this method is generally comparable with state-of-the-art, highly accurate methods. As mentioned, the presented results were obtained using two layers of the WST. We have shown the effect of using fewer layers on the results in Supplementary Appendix 1. In the appendix, we have also discussed the cause of failure cases in the classification. Among the advantages of this method over deep learning methods is its short processing time. The computational cost of the WST depends only on the input size of the image, the chosen predefined scale $2^J$, and the number of angles $r$, and can be given as $O(N_J \log(N))$ for an image with $N$ pixels. This means that the effort to perform the WST is even smaller than the cost needed to compute the low-rank approximation of the correlation matrix of size $N_J \times N_J$ for classification.
In Table 6 we report all the obtained accuracies in this paper.Considering accuracy, our method outperforms previous research in DR detection using the OCTID dataset (with a very small amount of data) and on the Duke dataset.In other cases, the accuracy of our method is not much different from the best results obtained.
We also calculated AUC (Area under the ROC Curve).According to Table 6, our method has the best AUC on the Duke dataset, but this result is lower compared to previous research reports, which mostly reached an AUC above 0.9.
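For reference, a binary AUC can be computed directly from classifier scores as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below is only illustrative: the text does not state how the AUC was computed (in particular for the multi-class datasets), and the function name and the toy scores are hypothetical.

```python
def binary_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Uses the pairwise (Mann-Whitney) formulation: count positive/negative
    pairs ranked correctly, with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy scores, e.g. negated distances to the affine space of the positive class.
    print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```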
Using ANOVA statistical testing, we calculated the P-value for the experiments.The best results were achieved in the experiment performed on the OCTID dataset with five classes and on the TOPCON dataset, as seen in the table.

Conclusions
Various retinal diseases can be diagnosed using OCT images. To overcome some shortcomings of manual diagnosis, such as mistakes and costs, computer-aided methods have been considered. Various classical machine learning and deep learning methods have been proposed in this field. Although deep learning techniques, specifically CNN-based methods, can achieve high accuracies in detecting different abnormalities, some shortcomings often make them impractical. Problems in practical application include the high computational complexity, long processing time, requirement of large datasets, and unclear interpretability.
In this paper, we implemented the wavelet scattering network to diagnose retinal abnormalities using OCT images. This transformation overcomes some of the mentioned shortcomings of CNN methods. In particular, the CNN of the WST is based on predefined wavelet filters. Employing only two layers of the WST, we achieved an efficient model with low computational complexity. This is the first time that the WST has been used on OCT images. In previous research, WST-based methods have been proposed for the classification of EEG and ECG signals, and in most cases, good results have been achieved compared to other methods. In this article, using this method and without pre-processing, we categorized retinal diseases using several OCT datasets to evaluate the method for different numbers of image classes, technologies, and sizes of images. We performed a comprehensive assessment and comparison of the method.
The accuracies of classifying the OCT images of the OCTID dataset into five and two classes were 82.5% and 100%, respectively. We achieved an accuracy of 96.6% in distinguishing DME from Normal cases using the TOPCON device-based dataset. The Heidelberg and the Duke datasets contain DME, AMD, and Normal classes, where we achieved 97.1% and 94.4%, respectively.
Comparing our results with the state-of-the-art models in the literature shows that this model outperforms the compared models in detecting DR in the OCTID dataset and on the Duke dataset with three classes. In other cases, our results are comparable with other works, specifically with CNN-based techniques. In some assessments, an acceptable decrease in accuracy compared with the best results reported in the literature was observed, in return for a substantial decrease in computational complexity and processing time, which are essential factors in practice.
Although the classification results with this method are generally good, it still needs to be improved. In future work, we aim to upgrade the method by finding more suitable wavelet filters that are particularly adapted to the special features of OCT images and which can increase the performance of diagnosing retinal disease. We will also examine the effectiveness of this method on real samples.

Figure 1. Block diagram of the model.

Figure 2. A sample of the OCT images in the datasets.

Figure 3. The wavelet scattering network with two layers.

Figure 4. The confusion matrix of WST on the OCTID dataset for diagnosing five classes of OCT images.

Figure 6. The confusion matrix of WST on the TOPCON dataset for diagnosing DME from Normal cases.

Figure 8. The confusion matrix of WST on the Heidelberg dataset.

Table 1. OCT datasets used in this work.

We want to use a model with low computational cost and a high classification rate so that it is implementable in practice for medical tasks. In this model, we use the WST to extract the important image features. Unlike deep learning models, the WST can be easily interpreted. The scattering coefficients at each scale and orientation capture different levels of signal information, which are crucial for better classification. The WST is designed to be invariant to deformations, rotations, and translations, making it robust to variations in the input signal. This is particularly important in medical imaging applications where the position and orientation of the part being imaged can vary. Moreover, it preserves high-frequency information. The WST requires fewer training examples than deep learning methods, making it a good choice for applications where labeled data is limited. We need a much smaller amount of training data to achieve clear discrimination of up to five classes.

Table 2. Comparing DR detection accuracy in different works.

Table 3. Comparing DME detection accuracy in different works.

Thomas et al. 33. The classification accuracy of this work on the Heidelberg dataset is close to the best results in the literature but lower than some. An overall view of the results on different datasets shows that this model achieves classification results similar to those of the other state-of-the-art models, specifically the CNN-based ones.

Figure 7. The confusion matrix of WST on the Duke dataset.

Table 4. Comparing DME, AMD, and Normal detection accuracy in different works using the Duke dataset.

Table 5. Comparing DME, AMD, and Normal detection accuracy in different works using the Heidelberg dataset.

Table 6. The experimental results of using the WST on four OCT datasets.